ABSTRACT

Interactions between proteins are highly conserved 
across species. As a result, the molecular basis of 
multiple diseases affecting humans can be studied 
in model organisms that offer many alternative 
experimental opportunities. One such organism,
Caenorhabditis elegans, has been used to produce
much molecular quantitative genetics and systems 
biology data over the past decade. We present 
WormQTL HD (Human Disease), a database that quan- 
titatively and systematically links expression 
Quantitative Trait Loci (eQTL) findings in C. elegans 
to gene-disease associations in man. WormQTL HD,
available online at http://www.wormqtl-hd.org, is a
user-friendly set of tools to reveal functionally
coherent, evolutionarily conserved gene networks.
These can be used to predict novel gene-to-gene as- 
sociations and the functions of genes underlying the 
disease of interest. We created a new database that 
links C. elegans eQTL data sets to human diseases 
(34337 gene-disease associations from OMIM, DGA, 
GWAS Central and NHGRI GWAS Catalogue) based 
on overlapping sets of orthologous genes associated
with phenotypes in these two species. We utilized QTL
results, high-throughput molecular phenotypes,
classical phenotypes and genotype data covering
different developmental stages and environments
from the WormQTL database. All software is available
as open source, built on MOLGENIS and xQTL 
workbench. 



INTRODUCTION 

Many exciting data sets have been collected in recent years 
for Caenorhabditis elegans, a free-living, non-parasitic soil- 
related nematode that feeds on the bacteria of decaying 
organic matter. This worm has many useful features that 
have made it one of the most studied model organisms: it 
is small and easy to house, has a short generation time and 
is transparent. As a consequence, its genomic information 
is now available (1), and the developmental path and 
function of almost every cell in its body has been described 
(2). In addition, recent genetical genomics studies in 
C. elegans have revealed thousands of genomic regions 
(loci) that are associated with quantitative variation in
a diverse range of phenotypes, such as gene expression 
[expression Quantitative Trait Loci (eQTLs)] (3-9), 
lifespan (10), development (11-13), stress resistance 
(14,15), behaviour (16,17), dauer formation (15,18) and 
sensitivity to RNAi treatments (19). 

Genes having eQTLs mapping to the same genomic 
region (i.e. hotspot) are possibly involved in the same bio- 
logical pathway/process. Palopoli et al. (5) have shown 
that biochemical processes and molecular functions of 
genes are generally highly conserved. Lee et al. (20) have 
shown that using the OMIM database (21) (http://omim. 
org/) and orthologue mapping data from INPARANOID 
(22), it is possible to infer new gene-gene interactions that 
are responsible for a certain disease in man from model 
organism data. McGary et al. (23) have shown that the 
conservation level between C. elegans and man is sufficient 
to infer gene-gene interactions in man from worm data. 
Even though the global disease phenotypes may not be 
at all comparable, the molecular basis may be common 
(e.g. breast cancer and high male incidence of progeny).
For example, research on stress response in C. elegans has 
provided detailed insight into the genetic and molecular 
mechanisms underlying complex human diseases (24). In 
addition, Shaye and Greenwald (25) have generated a 
compendium of C. elegans genes with human orthologues 
using four orthology prediction programmes for identify- 
ing C. elegans orthologues of human disease genes 
for potential functional analysis. As a result, linking 
C. elegans and human data could help to understand the 
mechanisms underlying many human diseases. 

To facilitate the exploitation of the worm eQTL data 
for human disease research we developed a new database, 
WormQTL HD, which quantitatively and systematically
links many eQTL findings in C. elegans to gene-disease
associations in man. The database is based on the
detection of the overlapping sets of orthologous genes
associated with different phenotypes, or 'phenologs' (23) 
between these two species. The data, mainly eQTL results, 
were taken from different platforms (e.g. Agilent) and ex- 
periments (e.g. developmental stages). We provide a set of 
web-based analysis tools to search the database and 
explore phenotypes based on gene orthologues between 
worm and man. The result can be downloaded and 
visualized in a comprehensive yet clear way. All data 
and tools can be accessed via a public web user interface, 
as well as basic programming interfaces, which were built 
using the MOLGENIS biosoftware toolkit (26). 

To our knowledge, this is the first online database for 
the systematic investigation of C. elegans phenotype 
equivalents of human diseases by integrating known 
disease-gene associations, gene orthologue data, molecu- 
lar phenotypes and QTL results. WormQTL HD allows re- 
searchers to explore these complex data in a user-friendly 
way, finding new genes, interactions and loci for human 
disease models. 

WormQTL HD is freely accessible without registration 
and is hosted at http://www.wormqtl-hd.org. All 
underlying software is open source and can be down- 
loaded and freely used, for example, as a local mirror of 
the database and/or to host new studies, which can be 
uploaded using XGAP format (27). Below we describe 
the results, methods used to implement the system and 
future plans. 



IMPLEMENTATION 

WormQTL HD was compiled using data from six sources 
that are described below: (I) WormQTL (24,28), (II) 
WormBase Phenotypes (29), (III) Online Mendelian 
Inheritance in Man (OMIM) (21), (IV) The Disease and 
Gene Annotations (DGA) (30), (V) NHGRI GWAS 
Catalogue (31) (http://www.genome.gov/gwastudies) and 
(VI) GWAS Central (32,33) (Figure 1). (I) WormQTL 
(http://www.wormqtl.org) contains many published 'gen- 
etical genomics' experiments and consists of 47 public data 
sets with eQTL data on 500 panels (Recombinant Inbred 
Lines or natural strains), 68452 microarray probes, 1630 
samples and 1579 markers. The tools that were present in 
WormQTL, such as the QTL Finder and the Genome
Browser, are also available in WormQTL HD.

Figure 1. Human and worm data integration. WormQTL HD was
compiled using data derived from WormQTL, WormBase, OMIM,
DGA, GWAS Catalogue and GWAS Central. [Diagram labels: human
gene symbols and gene IDs (HGNC, ENSG) and protein IDs (ENSP);
worm gene expression probes (WormQTL), gene IDs (WBGene),
protein IDs (WP:CE), QTLs (WormQTL) and phenotypes (WormBase).]

(II)
WormBase is 'an international consortium of biologists 
and computer scientists dedicated to providing the 
research community with accurate, current, accessible in- 
formation concerning the genetics, genomics and biology 
of C. elegans and related nematodes' (WormBase Mission 
statement, Todd Harris, 26 November 2012). From 
WormBase, we downloaded all the gene-phenotype
associations (total 227 216) via WormMart. (III) OMIM is one
of the most popular databases and contains 14164 human
gene-disease associations. (IV) The DGA database (2961
associations) was started in 2013 and claims to be more 
comprehensive than OMIM. (V) The NHGRI GWAS 
Catalogue is a collection of 12 925 SNP-to-disease associ- 
ations published in GWAS studies with at least 100000 
assayed SNPs and a P-value of <1.0 × 10^-5. The SNPs
were linked to human genes by the authors of the 
original papers that have been included in the catalogue. 
(VI) GWAS Central (32-34) is a database of summary level 
findings from genetic association studies. The authors of 
GWAS Central gathered and curated many datasets from 
public domain projects, and supplied us with a list of 
4487 gene-disease associations having a P-value of 
<1.0 × 10^-10. Because of the non-overlapping information
in these four sources of human genes linked to disease, they 
are all provided and can be selected by the user. Human 
and worm data are connected based on the detection of 
orthologous genes in these two species. We downloaded 
all INPARANOID orthologues between C. elegans and 
Homo sapiens with a 100% bootstrap value. The bootstrap 
value indicates how often the pair is found as
reciprocal best matches in a sampling-with-replacement
procedure applied to the original BLAST alignment.
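
The orthologue filtering step described above can be sketched as follows. This is a minimal illustration, not the WormQTL HD implementation: the record layout, the function names and the second example record are hypothetical, and the bootstrap value shown for the pptr-1/PPP2R5E pair is assumed for the purpose of the example.

```python
from typing import NamedTuple


class OrthologuePair(NamedTuple):
    worm_gene: str    # worm gene identifier (e.g. a WBGene ID)
    human_gene: str   # human gene symbol
    bootstrap: float  # bootstrap support in percent


def keep_fully_supported(pairs, min_bootstrap=100.0):
    """Return only the pairs whose bootstrap support reaches the cut-off."""
    return [p for p in pairs if p.bootstrap >= min_bootstrap]


def worm_to_human(pairs):
    """Index orthologue pairs as worm gene -> set of human orthologues."""
    index = {}
    for p in pairs:
        index.setdefault(p.worm_gene, set()).add(p.human_gene)
    return index


# Illustrative records; the bootstrap values are assumptions.
pairs = [
    OrthologuePair("WBGene00012348", "PPP2R5E", 100.0),
    OrthologuePair("WBGene_hypothetical", "HUMAN_HYPOTHETICAL", 87.0),
]
index = worm_to_human(keep_fully_supported(pairs))
print(index)
```

Only the fully supported pair survives the filter, so the resulting index contains a single worm gene.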

To explore this database, WormQTL HD features four
major searching tools for different purposes. The starting
points are summarized in Figure 2 and described in detail
below, followed by a short summary of the software used.

Figure 2. Cross-experiment search. WormQTL HD provides four tools to explore the database: mapping human diseases to worm QTLs
(Disease2QTL); mapping a worm genomic region to human diseases (Region2disease); mapping worm QTLs to human diseases (QTL2disease);
and linking worm phenotypes to human diseases (ComparePheno).



Tool 1: 'Disease2QTL', mapping human diseases to 
worm eQTLs 

Exploring the genetic variation data for human gene 
orthologues in worm can provide useful insight into the 
function and regulation of human diseases. WormQTL HD 
provides a tool for human geneticists to explore novel 
causal genes for a specific human disease by using worm 
QTL findings. Using a selection of one or multiple human 
diseases (from OMIM, DGA, NHGRI GWAS Catalogue 
or GWAS Central), a 'shopping' page is presented with 
worm gene expression probes and their human disease 
association. More information about the gene orthology 
mapping and association studies can be browsed. Users 
can put individual probes, or all probes at once, into the 
'shopping cart'. Subsequently, they can explore the genetic 
variation of those genes across the different experiments 
and studies that are stored in the WormQTL HD database.
The shopping cart is a central place in WormQTL HD 
where users can see the various worm gene probes that 
they have selected, and create QTL/eQTL visualizations 
from the items in the shopping cart using 'Plot QTLs'. 



Using the 'Plot QTLs' function, researchers can test if 
genes associated with the selected diseases have any QTLs 
and if they map to a common genomic region. Shared 
QTLs suggest that those genes are regulated by the same 
genetic variation and are possibly involved in the same 
biological pathways. The genes with cis-QTLs in that
genomic region are used as candidate genes in several 
types of studies (35-37). The same approach can be used 
for causal genes of human diseases. Alternatively, users 
can also select worm phenotypes (1504 total) instead of 
human diseases as a starting point. The shopping window 
is presented in exactly the same way as before, so users can 
browse human diseases from a worm phenotype perspec- 
tive instead, or simply shop for probes of choice for a 
given worm phenotype and plot their QTLs, without 
considering any human disease relation. 
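
The check for shared QTLs can be pictured with a small sketch. It assumes that each probe has already been reduced to a single peak (chromosome, position, LOD score); the tuple layout, the LOD threshold of 3 and the fixed 1 Mb window are illustrative assumptions and are not taken from the WormQTL HD code.

```python
from collections import defaultdict


def shared_qtl_regions(peaks, lod_threshold=3.0, window_bp=1_000_000):
    """Return fixed-size windows that contain significant peaks of >= 2 probes.

    peaks: iterable of (probe, chromosome, position_bp, lod) tuples.
    """
    by_window = defaultdict(set)
    for probe, chrom, pos, lod in peaks:
        if lod >= lod_threshold:
            # Window index = position divided by window size, rounded down.
            by_window[(chrom, pos // window_bp)].add(probe)
    return {w: sorted(p) for w, p in by_window.items() if len(p) >= 2}


# Hypothetical peaks for four probes.
example = [
    ("probe_A", "V", 15_900_000, 6.2),
    ("probe_B", "V", 15_950_000, 4.8),
    ("probe_C", "II", 3_100_000, 5.1),
    ("probe_D", "V", 15_700_000, 1.2),  # below threshold, ignored
]
shared = shared_qtl_regions(example)
print(shared)
```

Fixed windows are the simplest possible grouping; two peaks that lie close together but on either side of a window boundary would be missed, so a sliding window or an overlap of confidence intervals may be preferable in practice.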

Tool 2: 'Region2disease', mapping worm genomic regions 
to human diseases 

Researchers can link worm genomic regions to human 
diseases. This approach starts by selecting a region in 
the worm genome, e.g. a known 'eQTL hotspot', where a
number of eQTLs are located. The region is selected by 
providing the chromosome name, start and end base pair 
positions. Users can quickly define a region of interest by 
using the location of any C. elegans gene. The database 
then returns all worm gene expression probes that are 
annotated in this region. From the probes, the corres- 
ponding worm genes are gathered, plus their human 
orthologues. The user is presented with a table containing 
the human-worm orthology and disease/phenotype asso- 
ciations in man and worm. After shopping for some or all 
of the relevant probes, users can choose to visualize eQTL 
results for them (similar to Tool 1), or perform a disease 
enrichment test. 
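
The lookup chain behind Region2disease (region, probes, worm genes, human orthologues, diseases) might be sketched as below. The dictionaries, field names and the probe positions are hypothetical stand-ins for the database tables; only the pptr-1/PPP2R5E pair, the disease name and the region boundaries come from Case 2 of this paper.

```python
def probes_in_region(probes, chromosome, start_bp, end_bp):
    """Return probes annotated inside the region (boundaries inclusive)."""
    return [
        p for p in probes
        if p["chr"] == chromosome and start_bp <= p["pos"] <= end_bp
    ]


def region_to_diseases(probes, orthologues, disease_genes,
                       chromosome, start_bp, end_bp):
    """Build (probe, worm gene, human gene, diseases) rows for a region."""
    rows = []
    for p in probes_in_region(probes, chromosome, start_bp, end_bp):
        for human_gene in sorted(orthologues.get(p["gene"], ())):
            diseases = sorted(
                d for d, genes in disease_genes.items() if human_gene in genes
            )
            rows.append((p["id"], p["gene"], human_gene, diseases))
    return rows


# Probe positions are made up for illustration.
probes = [
    {"id": "probe_1", "gene": "WBGene00012348", "chr": "V", "pos": 15_900_000},
    {"id": "probe_2", "gene": "WBGene_hypothetical", "chr": "V", "pos": 2_000_000},
]
orthologues = {"WBGene00012348": {"PPP2R5E"}}
disease_genes = {"Response to antineoplastic agents": {"PPP2R5E"}}

rows = region_to_diseases(probes, orthologues, disease_genes,
                          "V", 15_430_739, 16_430_739)
print(rows)
```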

The hypergeometric gene overlap test (38) to discover 
phenologs (phenotype orthologues) can be performed by 
clicking on 'Disease enrichment'. All probes in the region 
are linked to their corresponding genes in worm, and a test
is performed to determine whether this entire group of genes is
significantly 'enriched' for one or more human diseases by
overlapping orthologous groups of worm and human
genes. The statistical significance of phenologs (P-value)
is listed in an output table. A significant result means that 
the input genomic region shares a significantly larger set of 
orthologous genes with a human disease than would be 
expected at random, even if the expressed phenotype in 
worm appears very different from the human disease 
phenotype (e.g. breast cancer and fertility). This tool can 
provide novel interpretation of genomic regions of 
interest. 
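
As an illustration of the gene overlap test, the sketch below computes an upper-tail hypergeometric probability, i.e. the chance of observing at least k shared orthologues when n1 worm genes and n2 human genes are drawn from a common pool. The function name is hypothetical, and the use of the upper tail is an assumption about how the test is defined. The example counts are those quoted for McGary et al. in Case 1 of this paper.

```python
from math import comb


def phenolog_p_value(total_orthologues: int, n1: int, n2: int, k: int) -> float:
    """Upper-tail hypergeometric probability P(overlap >= k).

    total_orthologues: size of the shared orthologue pool
    n1: orthologues linked to the worm phenotype
    n2: orthologues linked to the human disease
    k:  orthologues linked to both
    """
    total_ways = comb(total_orthologues, n2)
    tail = sum(
        comb(n1, i) * comb(total_orthologues - n1, n2 - i)
        for i in range(k, min(n1, n2) + 1)
    )
    return tail / total_ways


# 4649 orthologues, 16 worm phenotype genes, 12 disease genes, 3 shared.
p = phenolog_p_value(4649, 16, 12, 3)
print(f"P = {p:.2e}")
```

With these counts the result is on the order of 7 × 10^-6, in line with the value reported for that phenolog. Exact integer binomials are used to avoid floating-point underflow for large gene pools.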

Tool 3: 'QTL2disease', mapping worm QTLs to human 
diseases 

Researchers can start by selecting a QTL/eQTL in worm 
to find potential relationships with human diseases. QTLs
of interest are selected based on three criteria: a
selected experiment, a certain threshold for significance
(LOD score) and a specific gene expression probe with a
suspected QTL. If there is a QTL with a LOD score above
the threshold, the tool automatically selects the closest 50 probes
on both sides of the highest peak marker. These probes are 
presented and available for browsing, shopping and 
plotting of QTLs, or can be the input for the disease 
enrichment test to find phenologs. 
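
A minimal sketch of this selection step is given below. It assumes a single chromosome, reads 'the closest 50 probes on both sides' as up to 50 probes on each side of the peak marker, and uses hypothetical dictionaries in place of the database; none of the names are taken from the actual software.

```python
def probes_around_peak(lod_by_marker, marker_pos, probe_pos,
                       lod_threshold, flank=50):
    """Return up to `flank` probes on each side of the highest LOD peak.

    All positions are base pairs on one chromosome (a simplification).
    An empty list is returned when the peak does not exceed the threshold.
    """
    peak_marker = max(lod_by_marker, key=lod_by_marker.get)
    if lod_by_marker[peak_marker] <= lod_threshold:
        return []
    peak = marker_pos[peak_marker]
    # Rank probes on each side by distance to the peak and keep the nearest.
    left = sorted((p for p in probe_pos if probe_pos[p] < peak),
                  key=lambda p: peak - probe_pos[p])[:flank]
    right = sorted((p for p in probe_pos if probe_pos[p] >= peak),
                   key=lambda p: probe_pos[p] - peak)[:flank]
    return sorted(left + right, key=probe_pos.get)


# Toy data: three markers, six probes, two probes kept per side.
lods = {"m1": 2.0, "m2": 7.5, "m3": 3.1}
markers = {"m1": 100, "m2": 200, "m3": 300}
probe_positions = {"p1": 50, "p2": 150, "p3": 190,
                   "p4": 210, "p5": 260, "p6": 400}
selected = probes_around_peak(lods, markers, probe_positions,
                              lod_threshold=5.0, flank=2)
print(selected)
```

The selected probes could then be passed on to plotting or to the disease enrichment test.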

Tool 4: 'ComparePheno', linking worm phenotypes to 
human diseases 

WormQTL HD also provides a tool that links human 
diseases to classical worm phenotypes (and vice versa) to 
discover phenologs in a systematic way. Users begin by 
selecting one or more human diseases and clicking on 
'Compare'. The genes associated with the selected 
disease are tested for enrichment against all sets of 
known associated genes for worm phenotypes. The 
result reveals functionally coherent, evolutionarily 
conserved gene networks. 

Alternatively, users can also start by selecting worm 
phenotypes, which are tested against human diseases. In 
addition to cross-species testing, results of within-species 
disease enrichment are also available (e.g. to find the 



closest related human disease for another input human 
disease). 

Software used 

All the software has been implemented using the open 
source 'MOLecular GENetics Information Systems'
(MOLGENIS) toolkit (26). The MOLGENIS toolkit is
Java-based software to generate tailored research infra- 
structure on-demand (39). In particular, we built on an 
existing MOLGENIS application, the extensible xQTL 
workbench (40) and the R/qtl QTL mapping and visual- 
ization package for the R language (41,42). All software is 
available as open source on http://github.com/molgenis 
for others to reuse locally. Related technical documenta- 
tion is available at http://www.xqtl.org, http://www.rqtl. 
org and http://www.molgenis.org. 

RESULTS 

To demonstrate the added value of WormQTL HD, we
have reproduced findings from known studies and have 
shown that novel insights and hypotheses can be 
achieved with little time and effort. Subsequently, we per- 
formed a broad-sweep disease-enrichment test to find all 
non-evident phenologs and to explore which new putative 
candidate genes for human diseases could be elucidated 
for future research. 

Case 1: Linking disease to worm phenotype from 
McGary et al. (23) 

McGary et al. performed a phenolog mapping between the 
high incidence of male C. elegans progeny to human 
breast/ovarian cancers. Of 4649 total orthologues, 
McGary et al. reported 3 overlapping genes of 12 
human disease-associated genes and 16 worm
phenotype-associated genes, which is a significant enrichment
(hypergeometric test P-value of <7.2 × 10^-6). Of the 13
worm phenotype-associated genes that were not 
overlapping, 9 had orthologues that had already been 
linked to breast cancer in the primary literature. They 
implicated the remaining four genes as new breast 
cancer candidates. We replicated these findings using the 
ComparePheno tool of WormQTL HD, searching for the
WormBase phenotype 'high incidence male progeny'. The 
first human disease among the results is '{Breast cancer, 
susceptibility to}, 114480 (3)' from OMIM. Our tool 
reported 2 overlapping genes of 4 human disease- 
associated genes and 63 genes from the worm phenotype. 
This resulted in a P-value <1.4 × 10^-3 (uncorrected). The
second best human hit in the results is 'malignant 
neoplasm of ovary' from DGA. We found two 
overlapping genes of six ovarian cancer associated genes, 
resulting in a P-value <3.41 × 10^-3 (uncorrected).
ComparePheno also indicated enrichment of these 
categories. The P-values are 'less significant' than those of
McGary et al. because (i) their definition of 'high incidence
male progeny' included only 16 rather than 63
genes and (ii) they used an older INPARANOID 
version, so the overlap test was performed on a different 
orthologue mapping. Together, these results from our 
database do indeed replicate their findings. See Online Use
Case 1 on the Help page to repeat this case. 

Case 2: Worm eQTL hotspot from two temperature 
expression data from Li et al. (43) 

Li et al. (43) found an eQTL hotspot (77.56 Mb on 
chromosome V) on the worm genome in which genetic 
variation is associated with the expression of 66 genes, 
while these genes are located elsewhere on the genome. 
This indicates that these genes are possibly involved in 
the same biological process/pathway and potentially 
share a regulatory element. This element may be physically
located at the eQTL hotspot, which controls gene
expression in response to different ambient temperatures.
First, we used the Region2disease tool and input positions 
ChrV:15430739-16430739 (a non-cumulative 1 Mb region
around the hotspot). We put all 931 probes located in this 
region in the shopping cart, and selected 'Disease enrich- 
ment'. The best hit was 'Response to antineoplastic 
agents' (agents used in chemotherapeutic treatment of 
cancer) from GWAS Catalogue (P-value <4.92 × 10^-3,
uncorrected). For this hit, the associated human gene,
PPP2R5E, is orthologous to WBGene00012348 (pptr-1)
present in this region. The best WormBase hit is
'thermotolerance increased' (P-value <1.5 × 10^-2,
uncorrected), also via association with pptr-1. Padmanabhan
et al. (44) showed that pptr-1 is involved in regulating 
subcellular localization and transcriptional activity of 
the forkhead transcription factor daf-16. Rodriguez et al. 
(24) reviewed the role of heat stress response experiments 
in C. elegans for detecting human disease genes. They 
reported that daf-16 in worms controls lifespan and 
stress response. In humans, the daf-16 orthologue 
FOXO3A is associated with aging and prevalence of
cancer (45). Using the Disease2QTL tool, a search for 
'Response to antineoplastic agents' results in six probes 
for orthologues of PPP2R5E (WBGene00012348) and
ACOX3 (WBGene00019060). We selected them all and
plotted the QTLs. This revealed a highly significant
(LOD > 50) cis-eQTL for pptr-1 in the Rockman et al.
(9) dataset. Given all the evidence, we believe pptr-1 
might be an interesting candidate in the further develop- 
ment of a temperature-based C. elegans model for under- 
standing human cancer and developing potential 
therapeutic drugs. Moreover, it shows that combining 
the 'Region2disease' and 'Disease2QTL' tools can lead to
an interesting hypothesis ready for experimental valid- 
ation. See Online Use Case 2 on the Help page to repro- 
duce this case. 

Case 3: Osmotic stress as a model for Bardet-Biedl 
syndrome from Rodriguez et al. (24) 

Rodriguez et al. proposed hypertonic or osmotic stress in 
C. elegans as a model to study human diseases related to 
protein aggregation, such as Alzheimer's and Parkinson's. 
Hypertonic stress due to loss of water causes an intracel- 
lular ionic imbalance, which leads to rapid accumulation 
of the organic osmolyte glycerol and accumulation of damaged
proteins. Shaye and Greenwald (25) showed that osm-12 
(associated with osmotic stress response) is orthologous to 



BBS7 in man, which is associated with Bardet-Biedl
syndrome (46). We used the Disease2QTL tool to look 
for QTLs associated with Bardet-Biedl syndrome by 
selecting all 'Bardet-Biedl syndrome' entries (seven in 
total) from OMIM. When we plotted the QTLs in worm 
for these entries, three significant eQTLs (LOD > 5) were 
found for osm-12 (in cis), bbs-5 (also in cis) and bbs-2 (in 
trans). The strongest QTL (LOD > 6) was found for bbs-5, 
reported by probe AGIUSA3442 in the Rockman et al. 
dataset. We used the QTL2disease tool to investigate this 
QTL further. It revealed a nearby, very significant eQTL 
(LOD > 10) for a gene named T07C4.10, which can be 
investigated further as a potential candidate for this 
disease model. See Online Use Case 3 on the Help page 
to replicate this example. 

Novel disease-gene associations by 'broad-sweep' 
disease-enrichment test 

We performed hypergeometric gene overlap tests to find 
phenologs between all worm phenotypes versus all human 
diseases. Table 1 lists the 15 most significant hits for 
human diseases that have significant gene overlap 
with worm phenotypes (see Supplementary Table S1
for the top 100). New candidate genes for human 
diseases can be discovered from phenologs by 
investigating human orthologues of worm genes that did 
not overlap with known human genes of the disease of 
interest. 
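
The candidate discovery step described here amounts to a set difference, as sketched below. The function name and the placeholder gene identifiers are hypothetical and serve only to show the logic.

```python
def candidate_genes(worm_phenotype_genes, known_disease_genes, worm_to_human):
    """Human orthologues of phenotype-linked worm genes not yet tied to the disease."""
    orthologues = set()
    for worm_gene in worm_phenotype_genes:
        orthologues |= worm_to_human.get(worm_gene, set())
    # Remove genes that already carry the disease association.
    return sorted(orthologues - set(known_disease_genes))


# Placeholder identifiers, not real genes.
worm_to_human = {"worm_1": {"HUMAN_1"}, "worm_2": {"HUMAN_2"}, "worm_3": {"HUMAN_3"}}
phenotype_genes = ["worm_1", "worm_2", "worm_3"]
known = {"HUMAN_1"}

candidates = candidate_genes(phenotype_genes, known, worm_to_human)
print(candidates)
```

In this toy case, HUMAN_1 forms the overlap that supports the phenolog, while HUMAN_2 and HUMAN_3 are returned as new candidates for follow-up.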

McGary et al. (23) reported 'Zellweger syndrome' in 
man to be a phenolog with 'Reduced number of 
peroxisomes' in yeast (P-value <1.0 × 10^-9). Our best hit
was 'Zellweger syndrome' with 'peroxisome physiology
variant' in worm (P-value <3.6 × 10^-10). Encouragingly,
certain top hits such as 'coenzyme Q depleted' in 
worm versus 'Coenzyme Q10 deficiency' in man, and 
'spontaneous mutation rate increased' in worm 
versus 'Mismatch repair cancer syndrome' in man make 
sense, thereby validating this approach and adding 
credibility to potentially non-evident human disease 
models. 



DISCUSSION 

The current version of WormQTL HD (August 2013) is a 
comprehensive and compendious database that enables 
molecular model organism data to be studied in the 
context of human diseases. Just as with WormQTL (24), 
we believe that WormQTL HD will be continuously curated 
by the members of the C. elegans community. The results 
of the 'broad-sweep' disease-enrichment test in combin- 
ation with the web tool will be of special interest to re- 
searchers in the human or worm domain. We believe these 
results could also be applied to prioritize the pathogenic 
variants increasingly being produced by next-generation 
sequencing in diagnostic labs. Genetic variants affecting 
human genes of unknown function may have worm 
orthologues that are part of human-worm phenologs 
and these may reveal or imply a role in a human 
disease. Thus, through functionally conserved networks, 
missing information can be inferred and candidate genes
can be selected via model organisms.

Table 1 (continued)

Worm phenotype (Ce)                   Human disease (Hs)                          n1  n2  k  P-value
...                                   ...                                         11   2  2  6.67E-06
Organism starvation response variant  Ovarian cancer, somatic, 604370 (3) (OMIM)  12   2  2  8.00E-06
Neuron development variant            Diastolic blood pressure (GWAS Catalogue)   17  11  3  9.85E-06
Ventral closure defective             Wiskott-Aldrich syndrome (DGA)               8   3  2  1.02E-05
Egg laying imipramine resistant       Bone mineral density (GWAS Catalogue)       26  23  4  1.08E-05
mRNA export variant                   disease by infectious agent (DGA)            4   6  2  1.09E-05

n1 indicates the number of orthologues in C. elegans (Ce) with phenotype 1, n2 the number in H. sapiens (Hs) with phenotype 2 and k the number in
both sets. The significance of each phenolog is assessed by the hypergeometric probability (P-value).

The approach of WormQTL HD is conceptually similar 
to that described by Smedley et al. (47). They created an 
automated method called PhenoDigm to provide evidence 
about gene-disease associations by analysing phenotypic 
information. In their case, phenotypes consist of a collec- 
tion of ontology terms, which are aligned and scored to 
derive an overall phenotype-similarity score. Using this 
method, known gene-phenotype associations in model 
organisms (mouse, zebrafish) can be transferred to other 
organisms such as man, and help us to understand the 
genetic cause of disease. This method works best when 
the model organism is physiologically close to man and 
has comparable classical phenotypes. It would 
therefore be less useful for C. elegans. However, 
combining the molecular (WormQTL HD ) and phenotyp- 
ical (PhenoDigm) approaches may result in a very 
powerful tool to discover novel gene-disease associations 
in man, especially when using physiologically close model 
organisms. 

We plan to further develop the WormQTL HD data and 
toolset. There might be more ways in which researchers 
would like to search through the large amounts of data, 
for example, based on custom lists of gene identifiers, or 
by combining tools such as finding QTLs within specific 
regions. The QTL plots could be improved or replaced 
with interactive graphs that are more informative and 
would allow the users to continue 'drilling down' in the 
data instead of returning to the home page for a new 
analysis with a different tool. Furthermore, we envisage 
close integration with other data sources and tools such as 
WormNet, R/qtl and GO Enrichment to provide even 
more biological context and analytical tools for the user. 

Our new database makes these data attractive and
easy to use for an even wider community of quantitative
geneticists working on worms and man. We are committed 
to maintaining the data and software in the future and 
invite the community to add and share their new data 
and ideas. 



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 